What will this graph look like?
30/5/2018
v1=rnorm(20)
v2=rnorm(20,mean=50,sd=10)
v3=factor(rep(c("cat","dog","fish","lizard"),times=5))
v4=factor(rep(c("brown","black","red","white"),each=5))
df1=data.frame(v1,v2,v3,v4)
par(mfrow=c(1,5))
plot(v1, main="Only v1") # note index
plot(v1,v2,main="v1, v2")
plot(v3,v1, main="v3, v1")
plot(v3~v1, main="v3 ~ v1")
plot(v3,v4,main="v3, v4")
plot(df1,main="data frame")
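One way to reason about the answer: plot() is a generic function, so the class of its first argument decides which method runs. The sketch below is not from the original slides; it only inspects the classes involved and checks, with base R's getS3method(), that a plot method is registered for each. The seed and the reduced data frame are assumptions made for illustration.

```r
# plot() is an S3 generic: the class of the first argument selects the method.
set.seed(1)  # seed chosen only so the sketch is reproducible
v1 <- rnorm(20)
v3 <- factor(rep(c("cat", "dog", "fish", "lizard"), times = 5))
df1 <- data.frame(v1, v3)

# Classes that drive the dispatch
classes <- c(v1 = class(v1), v3 = class(v3),
             formula = class(v3 ~ v1), df1 = class(df1))
print(classes)

# Check that a plot method is registered for each of those cases
plot_classes <- c("default", "factor", "formula", "data.frame")
has_method <- vapply(
  plot_classes,
  function(cl) !is.null(getS3method("plot", cl, optional = TRUE)),
  logical(1)
)
print(has_method)
```

So plot(v1) falls through to the default method, plot(v3, v1) and plot(v3, v4) go to the factor method, plot(v3 ~ v1) goes to the formula method and plot(df1) goes to the data frame method, which is why near-identical calls can produce very different kinds of graphic.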
Lots of grammars, but most people think of grammar in terms of language.
A Definition:
The structural rules governing the syntax and composition of clauses, phrases and words. Words can be broadly classified as adjectives (descriptive), nouns (objects) or verbs (actions).
"Let's eat Grandma"
vs.
"Let's eat, Grandma"
Adjective order: opinion-size-age-shape-colour-origin-material-purpose Noun
lovely little old rectangular green French silver whittling knife
vs.
French old little green silver whittling lovely rectangular knife
Source: qz.com/773738/how-non-english-speakers-are-taught-this-crazy-english-grammar-rule-you-know-but-youve-never-heard-of/
Why is a grammar of graphics important?
To understand, discuss and think deeply about graphs and their components we need a common language and thus a graphic grammar to facilitate this.
"The Grammar of Graphics" by Leland Wilkinson, Anand & Grossman (2005).
A graph is an abstract idea.
It becomes a graphic through the process of rendering.
A graph is made up of an aggregate of layered collections of components. Each layer has its own aesthetic properties that can be used to further describe the layer.
Wilkinson proposed that there are three stages in moving from data to a final graphic: Specification, Assembly and Display.
The key stage is "Specification", as computers automate Assembly & Display.
A change in a graph specification results in a different graphic being rendered.
| ID | Specification | Description | Example |
| 1 | DATA | A set of data operations that create variables from datasets | |
| 2 | TRANS | Variable transformations | Rank, bin |
| 3 | SCALE | Scale transformations | Log |
| 4 | COORD | A coordinate system | Cartesian, polar |
| 5 | ELEMENT | Graphs | Points, line |
| | | Aesthetic attributes | Colour, transparency, label |
| 6 | GUIDE | One or more guides | Axes, legends |
Similarities: All use the 'Mammals Sleep' dataset and the variable 'vore' (DATA).
All use a count of the variable 'vore' (TRANS) and the same colour aesthetic (ELEMENT), with a legend (GUIDE).
Differences: (A,B,C) use a Cartesian coordinate system; (D) uses a polar coordinate system (COORD). (C) uses flipped axes relative to (A,B). Each graphic has its own title (GUIDE).
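The idea that changing one part of the specification changes the rendered graphic can be sketched in ggplot2. This is an illustrative reconstruction, not the code behind the original panels: the object names are hypothetical, and only the msleep data, the 'vore' count and the coordinate systems are taken from the description above.

```r
library(ggplot2)

# DATA: msleep; TRANS: count of vore (geom_bar's default stat);
# ELEMENT: bars with a fill colour aesthetic; GUIDE: legend and title.
base <- ggplot(msleep, aes(x = vore, fill = vore)) +
  geom_bar()

# Only the COORD and GUIDE parts of the specification change below
p_cartesian <- base + labs(title = "Cartesian")                # default COORD
p_flipped   <- base + coord_flip()  + labs(title = "Flipped")  # swapped axes
p_polar     <- base + coord_polar() + labs(title = "Polar")    # polar COORD

print(p_polar)
```

The data, the transformation and the colour aesthetic are declared once in `base`; each variant differs from it by a single specification component.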
R's predecessor language, S, originated in the late 1970s. The existing R graphics systems (such as base, grid & lattice) were built on the ideas of their day.
Enter ggplot2.
The ggplot2 package was developed by Hadley Wickham.
Influenced by the philosophy of Leland Wilkinson's 'Grammar of Graphics'.
ggplot2's specification syntax, particularly for ELEMENTS, is far richer.
Adopted modern ideas such as object oriented programming, pipes, and a new programming paradigm that had not been widely used before in the R ecosystem.
Consistent syntax and naming conventions.
Programming, or writing code in R, is how we instruct the computer to complete the assembly and display stages of a graphic's construction.
Two Programming paradigms:
Imperative - Code step by step to create a graphic. Complicated graphics require programming techniques such as for loops.
Declarative - You describe and code what your graph actually is.
# Imperative
plot(awake)
# Plot with lines and points
plot(awake,type="p",col="red")
lines(awake, col="blue")
# Alternate method
plot(awake,type="l",col="blue")
points(awake,col="red")
# Declarative
ggplot(data=msleep,aes(x=seq(1,length(awake)),y=awake)) +
geom_point(colour="red") +
geom_line(colour="blue")
ggplot2's far richer specifications.
A plot in ggplot2 can be described as
Plot = (Data + Mapping) + Layer
where a
Layer = Geom + Geom parameters + Stat + Stat parameters + Position
Each graphic contains at a minimum Data, Geometric object (Geom), Statistical transformation (Stat), Scales and a Co-ordinate System.
A graphic can have multiple layers.
In practice each geom has a default statistic associated with it and vice versa.
ggplot2 has sensible defaults built in.
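The layer decomposition and the built-in defaults can be made concrete with a small sketch. It assumes the msleep data used elsewhere in these slides and compares the usual shorthand with ggplot2's lower-level layer() function, where the geom, stat and position are spelled out explicitly. The object names are hypothetical.

```r
library(ggplot2)

# Short form: geom_bar() supplies its default stat ("count")
# and its default position ("stack")
p_short <- ggplot(msleep, aes(x = vore)) + geom_bar()

# Long form: the same layer with every component written out
p_long <- ggplot(msleep, aes(x = vore)) +
  layer(geom = "bar", stat = "count", position = "stack")

# Both layers compute the same counts per category
print(layer_data(p_short)$count)
print(layer_data(p_long)$count)
```

In everyday use the short form is enough; the long form is mainly useful for seeing which defaults a geom brings with it.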
ggplot2 code can be written and used in 3 main ways.
Layers are added by linking them with '+'.
Consistent syntax for usage: "specification_description", e.g. geom_point()
The key to ggplot2's success.
Aesthetics, often shortened to 'aes', refers to both the mapping and the visual aspects of a graph.
An aesthetic describes how variables are mapped or assembled into the graph's coordinate system.
Aesthetics also include colour, scale, size, fill, shape and transparency (called alpha).
Aesthetics allow additional information to be encoded into plots.
They allow the user to recognise differences between discrete & continuous data.
Mappings & aesthetics can be 'inherited', or passed through to each layer automatically, if described in the initial mapping aesthetic.
# A function
ggplot(msleep,aes(x=awake, y=sleep_rem)) +
geom_point()
# B object; Note inheritance of colour
# (vore.f is assumed to be a factor version of vore created earlier)
g = ggplot(msleep,aes(x=awake, y=sleep_rem,colour=vore.f))
g + geom_point()
# C pipes
msleep %>% ggplot(.,aes(x=awake, y=sleep_rem)) +
geom_point(shape=15,aes(colour=brainwt))
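The difference between mapping an aesthetic inside aes() and setting it as a fixed parameter, and how a layer can opt out of an inherited mapping, is easy to miss in the examples above. The following is a minimal sketch under the assumption that the plain 'vore' column of msleep is used for colour; the object names are hypothetical.

```r
library(ggplot2)

# Mapping: colour varies with a variable, declared once and inherited by layers
p_mapped <- ggplot(msleep, aes(x = awake, y = sleep_rem, colour = vore)) +
  geom_point() +                                  # inherits x, y and colour
  geom_smooth(aes(colour = NULL),                 # drops the inherited colour,
              method = "lm", se = FALSE)          # so one line fits all points

# Setting: colour is a fixed value outside aes(), so every point is red
p_set <- ggplot(msleep, aes(x = awake, y = sleep_rem)) +
  geom_point(colour = "red")
```

Printing p_mapped shows points coloured by diet with a single regression line, while p_set shows uniformly red points and no colour legend.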
g = ggplot(mtcars,aes(y=wt,x=mpg,colour=gear))
g + geom_point()
# Recall that g = ggplot(mtcars,aes(y=wt,x=mpg,colour=gear))
g + geom_point() +
geom_smooth() # note
g + geom_point(shape=15) +
geom_smooth(colour="green",fill="blue",span=.25) +
geom_rect(aes(xmin=22.5,xmax=35,ymin=4,ymax=5),fill="white")
g + geom_point(shape=15) +
geom_smooth(aes(fill=gear)) +
geom_rect(aes(xmin=22.5,xmax=35,ymin=4,ymax=5),fill="white")+
annotate("text", x=30,y=4.5,label="Lots of Regression Lines") +
labs(title="Simple ggplot: Mtcars dataset")
g + geom_point(shape=15) +
geom_smooth() +
coord_flip() +
facet_wrap(~gear) +
labs(title="Simple faceted ggplot: Mtcars dataset")
pima %>% ggplot(.,aes(x=Plasma.glucose, fill=Class)) +
geom_histogram(binwidth=20,colour='black')
pima %>% ggplot(.,aes(x=Plasma.glucose, fill=Class)) +
geom_histogram(binwidth=30,colour='black') +
facet_grid(Class~pregnancy.count)
pima %>% ggplot(.,aes(x=Plasma.glucose, fill=Class)) +
geom_histogram(binwidth=30,colour='black') +
facet_grid(Class~pregnancy.count) +
labs(title="Plasma Glucose Concentration by Pregnancy Count",
subtitle="Data=Pima Indian Diabetes") +
theme(axis.text.x = element_text(size=5,angle=90),
plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))
pima %>% ggplot(.,aes(x=pregnancy.count,y=Plasma.glucose)) + geom_boxplot(notch=TRUE)
pg.median <- pima[,2] %>% median()
# Note: fun.y is deprecated since ggplot2 3.3.0; newer versions use fun
pima %>% ggplot(.,aes(x=pregnancy.count,y=Plasma.glucose)) +
geom_boxplot(fill='lightblue',alpha=0.4,notch=FALSE)+
geom_hline(yintercept = pg.median,colour='red',linetype="dashed") +
stat_summary(fun.y = "mean", geom = "point", size= 1.3,shape=17,colour="red4")
pima %>% ggplot(.,aes(x=pregnancy.count,y=Plasma.glucose)) +
geom_boxplot(fill='lightblue',alpha=0.4,notch=FALSE)+
geom_hline(yintercept = pg.median,colour='red4',linetype="dashed") +
stat_summary(fun.y = "mean", geom = "point", size= 1.5,shape=17,colour="red")+
theme_bw() +
labs(title="Plasma Glucose Boxplot by Pregnancy Count",
subtitle="Data=Pima Indian Diabetes; Overall Plasma Glucose Median=117; Triangle indicates subset mean")
D1 <- diamonds %>% ggplot(aes(x=carat, y=price)) + geom_point()
D2 <- diamonds %>% ggplot(aes(x=carat, y=price)) + geom_point(alpha=0.01)
D3 <- diamonds %>% ggplot(aes(x=carat, y=price, colour=cut)) + geom_point()
D4 <- diamonds %>% ggplot(aes(x=carat, y=price, colour=cut)) + geom_point(alpha=0.25)
D1+D2+D3+D4 # using patchwork for layout
D5 <- diamonds %>% ggplot(aes(x=carat, y=price, colour=cut)) + geom_hex()
D6 <- diamonds %>% ggplot(aes(x=carat, y=price, colour=cut)) + geom_hex(alpha=0.25)
D5+D6
A grammar should not limit your thinking.
Thus ggplot2 is extendable to embrace other types of graphics.
Described & decomposed in a consistent way.
http://www.ggplot2-exts.org/gallery/
Two examples;
ggraph (graph nodes)
ggQC (control charts)
graph <- graph_from_data_frame(highschool) # require library(ggraph) & library(igraph)
G1 <- ggraph(graph) +
geom_edge_link(aes(colour = factor(year))) +
geom_node_point() +
labs(title="Nodes")
G2 <- ggraph(graph, layout = 'linear') +
geom_edge_arc(aes(colour = factor(year))) +
labs(title="Arc")
G1+G2 # using patchwork for layout
set.seed(5555) #library(ggQC)
Process1 <- data.frame(processID = as.factor(rep(1,30)),
metric_value = rnorm(30,0,1),
subgroup_sample=rep(letters[1:10], each = 3),
Process_run_id = 1:30)
ggXbarR <- ggplot(Process1, aes(x=subgroup_sample, y = metric_value, group=1))
ggXbarR + stat_summary(fun.y = "mean", colour = "black", geom = c("line")) +
stat_summary(fun.y = "mean", colour = "black", size = 2, geom = c("point")) +
stat_QC(method = "xBar.rBar") +
stat_QC_labels(method = "xBar.rBar", digits = 2)
Recall graph vs graphic definition.
The Assembly stage parallels the specifications.
The rendering of the graph and its aesthetic attributes to a final display system graphic.
Rendering may not just be to a 2 dimensional computer screen…
Not long ago, rendering was traditionally to paper.
What are the rendering issues with augmented/virtual reality displays?
Moving away from static graphics.
With advances in technology - many want to interact with data.
Interaction should be intuitive.
How do you specify a human interaction experience?
Animations, interactive data?
Some solutions -> ggvis, ggraptR, animations, plotly, rbokeh
Simple interactive plot - one function converts ggplot object.
Filtering discrete data, navigation tools
library(plotly)
attach(msleep)
ms1 <- ggplot(msleep,aes(x=awake,colour=vore)) + geom_histogram()
ggplotly(ms1)
ggplot style syntax & layers. Uses the term "glyph" for components.
Hover tool extracts non-visualised data values.
library(rbokeh)
elements %>%
figure(width = 600, height = 250) %>%
ly_points(x=atomic.mass,y=atomic.number,
size=7,
color=CPK,
hover=list(symbol, name, CPK))
Linking Plots; tools specified
tools <- c("pan", "wheel_zoom", "box_zoom", "box_select", "reset")
p1 <- figure(tools = tools, width = 400, height = 300) %>%
ly_points(Sepal.Length, Sepal.Width, data = iris, color = Species)
p2 <- figure(tools = tools, width = 400, height = 300) %>%
ly_points(Petal.Length, Petal.Width, data = iris, color = Species)
grid_plot(list(p1, p2), same_axes = TRUE, link_data = TRUE)
Grammar provides a common language.
Grammar helps understand, discuss and think deeply about graphs and their components.
Build with Layers
Aesthetics (aes) - mapping into coordinate space, but also perceptual attributes, e.g. colour
PDF of slides will be made available.
How would you describe?
Imagine a different co-ordinate system.
library(circlize)
# Class vs pregnancy Count
circos.clear()
pima2=pima[,c(9,1,2)]
circos.par("gap.degree" = c(5,15,rep(4,times=16),15))
chordDiagram(pima2)
circos.clear()